Mining the Web for Association Similarity between Concepts

نویسندگان

  • Li Chen
  • Zi-lin Song
  • Ying Zhang
  • Ning Zhou
چکیده

Measures of similarity between two terms or concepts have been widely used in the domain of Natural Language Processing, Semantic Web, and so on. There are mainly two kinds of methods for measuring similarity. One is based on prior manually built taxonomy or Ontology; the other which is usually referred to as the statistical approaches is based on the corpus. However, the ontology-based method has problem of coverage and the corpus-based method has the problem of sparse data. In order to overcome these problems, a huge data source World Wide Web was used to calculate similarity between concepts. The concept similarity was measured using the association rule mining in the snippets returned from Web search engines. The most influential algorithm for association rule mining is Apriori. In order to improve the efficiency of Apriori algorithm and use it to measure the concept similarity, there are three main improvements in Apriori algorithm. The experimental result shows that the algorithm can improve the precise of measuring concept similarity.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Similarity measurement for describe user images in social media

Online social networks like Instagram are places for communication. Also, these media produce rich metadata which are useful for further analysis in many fields including health and cognitive science. Many researchers are using these metadata like hashtags, images, etc. to detect patterns of user activities. However, there are several serious ambiguities like how much reliable are these informa...

متن کامل

خوشه‌بندی اسناد مبتنی بر آنتولوژی و رویکرد فازی

Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...

متن کامل

Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

  One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares active user’s items rating with historical rating records of other users to find similar users and recommending items which seems interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...

متن کامل

Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining

Industrial-Scale R&D IT Projects depend on many sub-technologies which need to be understood and have their risks analysed before the project can begin for their success. When planning such an industrial-scale project, the list of technologies and the associations of these technologies with each other is often complex and form a network. Discovery of this network of technologies is time consumi...

متن کامل

Prediction of user's trustworthiness in web-based social networks via text mining

In Social networks, users need a proper estimation of trust in others to be able to initialize reliable relationships. Some trust evaluation mechanisms have been offered, which use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • JSW

دوره 7  شماره 

صفحات  -

تاریخ انتشار 2012